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(54) Method for generating a computer file 

(57) The invention relates to means for generating 
a computer file comprising a data field and a header. 
The invention is characterized in that in the header is 



stored identification information other than the informa- 
tion required in the presentation of any data stored in 
the data field on a computer display screen. 
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Description 

The present invention relates to means for gener- 
ating a computer file. 

Computer software generates files, which generally 
take the form of digital representations of information. 
For example, when a user stores a document using 
word-processing software, a file is generated, which 
contains the document. 

The software generally adds a "header" to the doc- 
ument itself. An example will illustrate. 

Assume that a user generates the following docu- 
ment: 

When buying coconuts, make sure that they are 
crack-free and have no mould on them. Shake them to 
make sure that they are heavy with water. Now hold co- 
conut in one hand over a sink and hit it around the center 
with the claw end of a hammer. 

When this document is stored as a binary file, each 
character (ie, the "W", "h - , "e", "n", and so on) is as- 
signed an eight-bit code. One type of assignment is 
shown in Figure 1, which illustrates a version of ASCII 
code. (As stated above, the file is generally stored in 
binary format. Hexadecimal, not binary, representation 
is used in Figure 1 , for simplicity.) 

When this file is stored in the normal manner by a 
word-processing program, a header is added, as shown 
in Figure 2. The particular header shown was added by 
the software package known as WordPerfect, available 
from WordPerfect Corporation, Orem, Utah. The header 
contains data such as the following: 

Identification of the generating software ("WPC" in 
the header); 

Identification of the printer selected by the user dur- 
ing generation of the document ("HP LaserJet Se- 
ries II"); 

Identification of the font selected by the user ("Cou- 
rier"); 

Identification of the pitch of the font ("10 pitch"); 

as well as other information. Therefore, from one per- 
spective, a computer file contains two types of informa- 
tion: (1) data created by the user and (2) header infor- 
mation, created by software operated by the user. 

An object of the invention is to provide a means for 
producing file having a structure which contains infor- 
mation which can be used by apparatus which process- 
es the file, so as to increase the ease of using the file in 
different computer systems. 

According to a first aspect of the present invention 
there is provided a means for generating a computer file, 
characterized in that said file comprises a data field and 
a header in which is stored identification information oth- 
er than the information required in the presentation of 
any data stored in said data field on a computer display 
screen. 

According to a second aspect of the present inven- 



tion there is provided a method of processing a compu- 
ter file, comprising the following steps: 

a) maintaining, in association with the file, informa- 
5 tion indicative of two or more of the following: 

i) natural language of the file; 

ii) type of encryption of the file; 

iii) type of compression of the file; 

10 iv) identity of the program which generated the 

file; 

v) identity of the operating system upon which 
the generating program ran; and 

vi) communication protocol required by the file; 

75 

b) reading said information and calling respective 
programs which process the file in accordance with 
the information. 

20 Embodiments of the present invention will now be 
described by way of example, with reference to the ac- 
companying drawings, in which:- 

Fig. 1 illustrates a hex dump of a text file, together 

25 with hex character equivalents; 

Fig. 2 illustrates a hex dump of a word-processing 
file, together with hex character equivalents; 
Fig. 3 illustrates one form of the invention; 
Fig. 4 illustrates apparatus which can use the infor- 

30 mation shown in Fig. 3 for processing the FILE DA- 
TA; 

Fig. 5 illustrates how certain information of Fig. 3 
can be stored physically within a file; 
Fig. 6 illustrates another form of the invention; and 
3S Fig. 7 illustrates yet another form of the invention. 

Figure 3 shows one form of a computer file pro- 
duced by a means in accordance with the invention. Six 
blocks, labeled 1 through 6, have been added to a nor- 
40 mal, prior-art, file which includes two components: (1) 
PRIOR ART HEADER INFORMATION and (2) FILE DA- 
TA (also called user data). 

The PRIOR ART HEADER INFORMATION corre- 
sponds, in principle, to the header shown in Figure 2, 
45 and is generated by the software used to create the file. 
The user of the software, in general, is not involved with, 
and is not interested in, this HEADER. 

The FILE DATA contains the information generated 
by the user, such as the phrase "When buying 
so coconuts . . ." indicated in Figure 2. The FILE DATA is 
the information generated by the user. 

For example, in the case of a word-processing pro- 
gram, the FILE DATA corresponds to the document gen- 
erated by the user. As another example, in the case of 
55 a drafting program, the FILE DATA corresponds to the 
drawing generated by the user. 

(In practice, a "system header" is also created, but 
is not shown. The system header is generated by, and 
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used by, the operating system, which is the physical 
agent which actually writes the bits, which make up the 
file, onto disc.) 

Six Types of Added Information 

The six added blocks of Figure 3 contain the follow- 
ing information. Block 1 contains information as to the 
natural language, if any, of the FILE DATA. Natural lan- 
guage refers to a human language, such as English or 
French, rather than another type of language, such as 
FORTRAN. 

Block 2 contains information about the type of en- 
cryption, if any, which the FILE DATA has experienced, 
or should experience. 

Block 3 contains information about the type of com- 
pression, if any, which the FILE data has experienced, 
or should experience. 

Block 4 contains information about the identity of 
the program which generated the FILE DATA, such as 
the name of the particular program generating the file, 
and the release number, or version number, of the pro- 
gram. (Programs continually undergo revision, and the 
revisions themselves are continually revised.) 

Block 5 contains information about the system upon 
which the generating program, identified in block 4, ran. 
Examples of different systems are Macintosh, IBM PC, 
Sun Workstation, Unix, etc. This information is relevant 
to the system header, discussed below. Block 5 may al- 
so contain information about the operating system upon 
which the generating program ran, such as DOS version 
6.0. 

Block 6 contains information needed by communi- 
cation protocols, which may be used subsequently to 
transmit the file electronically from one location to an- 
other. As an example, some communications protocols 
transmit only seven-bit information. With such protocols, 
eight-bit data, such as compiled computer programs 
and some type of bit-mapped data, cannot be transmit- 
ted. Block 6 in Figure 3 would indicate the type of data 
(eight-bit or seven-bit) contained in the file, and thus in- 
dicate which protocols are needed. 

Some examples of protocols are the commercial 
standards ISDN, TCP/IP, and ATM. 

How Six Types are Used 

The information contained within the six blocks is 
used by the systems shown in Figure 4. System 10 
reads block 1 , and calls an appropriate language trans- 
lation program. For example, if the language indicated 
in block 1 is German, then system 10 knows that one of 
the translation programs in group 12 will be used. The 
translation will be from German, to another language. If 
the language is English, then system 10 knows that one 
of the translation programs in group 14 will be used. 
(The particular program in each group which will be se- 
lected will depend upon the language into which the file 



is to be translated, and is not a concern of the present 
invention.) 

System 20 reads the information of block 2, and 
calls an appropriate encryption, or de-encryption, pro- 
^ gram. 

System 30 reads the information of block 3, and 
calls an appropriate compression, or de-compression, 
program. 

System 40 reads the information of block 4, and 
10 calls an appropriate table which contains relevant char- 
acteristics of the generating program. For example, if 
the generating program is WordPerfect, system 40 calls 
a table containing information needed to translate, or 
use, the header information shown in Figure 2. 

System 50 reads the information of block 5, and 
calls an appropriate table which contains relevant char- 
acteristics of the system upon which the generating pro- 
gram ran. This table contains, among other things, in- 
formation about the system header of the file. 

System 60 reads the information of block 6, and re- 
sponds appropriately. 

How to Store the Six Types of Data 

How the data of the six blocks is associated with the 
prior-art file (ie, the FILE DATA and PRIOR ART HEAD- 
ER INFORMATION of Figure 3) will now be discussed. 

First Approach 

The six blocks can be added to a file as shown in 
Figure 5, which shows six added fields, labeled ADDED 
INFORMATION. Each horizontal line in the fields indi- 
cates one character. Each field may be identified by a 
code, such as NL for Natural Language, EN for Encryp- 
tion, and so on. 

However, it is possible that adding information to the 
header of a file in this manner will disrupt operation of 
the generating program. For example, a word-process- 
ing program may expect a specific number, and type, of 
fields within the HEADER. If the program sees added 
fields, it may not recognize the file, and issue an error 
warning. 

Consequently, after the six fields have been added, 
it is possible that programs intended to retrieve and 
process the Fl LE DATA must be specifically re-designed 
to negotiate the six added fields. Designing programs to 
accomplish this is within normal programmer's skill, but 
it is possible that many existing programs are not so de- 
signed. It is also possible that software developers may 
be resistant to undertake such re-design. 

Second Approach 

To avoid this problem, the header of the generating 
program can be modified to contain an empty region into 
which a user can insert data, and which the generating 
program does not use. For example, as shown in Figure 
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6, a file contains a SYSTEM HEADER and a header 
generated by the generating program, labeled APPLI- 
CATION PROGRAM HEADER, as usual. In addition, 
contained within the APPLICATION PROGRAM HEAD- 
ER are contained two delimiters, such as AAAA and 
ZZZZ, as indicated. The user is allowed to insert data 
between these delimiters. In Figure 6, the six blocks of 
Figure 3 have been inserted. 

Under this approach, the application program is de- 
signed to ignore information contained between these 
delimiters. How to design the programs to accomplish 
this is known in the art, once the idea is suggested. 

For the invention to use such a file structure, the 
invention reads the file, in a manner known in the art, 
and looks for the proper delimiter, which is AAAA in this 
case. The invention then looks for the NATURAL LAN- 
GU AG E block, as by searching for the code NL (see Fig- 
ure 5). When this code is found, the invention then reads 
a next predetermined number of subsequent characters 
to ascertain the natural language. 

Next, the invention looks for the encryption code, 
EN, in a similar manner, and proceeds until the delimiter 
ZZZZ is found. 

Thus, under this approach, a space is reserved 
within a header of a file (or elsewhere) into which a fourth 
party can insert information. The fourth party is, in this 
case, software associated with the invention. 

The party is fourth" because three other "parties" 
are involved in inserting data into a file: The application 
program is considered one party, because it generates 
the APPLICATION PROGRAM HEADER shown in Fig- 
ure 6. The user is considered a second party, because 
the user generates the FILE DATA. The operating sys- 
tem is considered a third party, because it inserts a sys- 
tem header (not shown herein). 

Third Approach 

It may not be feasible to expect manufacturers of 
application programs to allow user data to be inserted 
or appended as shown in Figure 6. In such a case, the 
invention can create an additional file, and associate it 
with each file created by the application program. Figure 
7 illustrates this association. 

The ORIGINAL FILE is that generated by an appli- 
cation program, such as a word-processing program 
which generated the file "When buying coconuts ..." as 
discussed in Figure 2. The invention generates an AUX- 
ILIARY FILE, which contains the information of blocks 
1 - 6 of Figure 3. 

The AUXILIARY FILE of Figure 7 can be created in 
numerous different ways, and it can conform to numer- 
ous different formats. Figure 7 shows a format similar to 
that of the ORIGINAL FILE, in that both contain SYS- 
TEM HEADERS and APPLICATION PROGRAM HEAD- 
ERS. Other formats are possible. 

For tracking purposes, the invention can give the 
AUXILIARY FILE an identical name as the ORIGINAL 



FILE, but with a slight modification. For example, if the 
ORIGINAL FILE is named "COCONUT.DOC", the in- 
vention can name the AUXILIARY HEADER as "COCO- 
NUT.AUX". 

5 The invention is expected to find significant use 
within a system described in the Related Applications. 
This system is available from AT&T Global Information 
Solutions Company, Dayton, Ohio, under the name 
"Continuous Learning System," or CLS. Briefly de- 
io scribed, CLS stores resources, such as computer files 
(which include data files, video files, audio files, and so 
on) at multiple different computer sites, which can be 
located throughout the world. 

A user, who can be located at any of the sites, or 

*5 linked to CLS via a communication channel, such as a 
cellular modem, can obtain a catalogue of all files con- 
tained in the system, to which the user is allowed ac- 
cess. The user can retrieve any file desired. 

When a user seeks to retrieve the ORIGINAL FILE 

20 shown in Figure 7, the invention also fetches the AUX- 
ILIARY FILE, reads the necessary information, and pro- 
ceeds accordingly. 

That is, when the ORIGINAL FILE is transmitted or 
copied, the invention always copies the AUXILIARY 

25 FILE, which contains the six blocks of data. Restated, 
the invention assures that, no matter where the ORIG- 
INAL FILE is transported, the AUXILIARY FILE is al- 
ways in association with it. 

For example, if a user, using CLS, retrieves the 

30 ORIGINAL FILE and stores it on a diskette, the inven- 
tion, without the user's knowledge, also retrieves the 
AUXILIARY FILE, and stores it on the same diskette. 
The invention may make the AUXILIARY FILE invisible 
to the user, so that the user does not see the AUXI LI ARY 

55 Fl LE when the user reads a directory of the diskette, but 
the information of the six blocks of Figure 3 are available 
for the systems of Figure 4. 

Recapitulation 

40 

Therefore, irrespective of the particular approach 
taken, such as that shown in Figure 3, 5, 6, or 7, the 
information of blocks 1-6 of Figure 3 is associated with 
the ORIGINAL FILE generated by the generating pro- 

^5 gram. The association may be accomplished by writing 
the data of the six blocks into a separate header of the 
file, writing the data into an existing header of the file, 
by creating an AUXILIARY FILE, or by other means. 
The six-block information and the ORIGINAL FILE 

so are stored in a repository. If a user seeks to retrieve the 
ORIGINAL FILE, the invention retrieves it, and also re- 
trieves the six-block information. The six-block informa- 
tion is maintained in association with the ORIGINAL 
FILE, after retrieval. (That is, within the repository, the 

55 ORIGINAL FILE and the AUXILIARY FILE are main- 
tained. Also, at the user's location, the ORIGINAL FILE 
and the AUXILIARY FILE are now maintained.) 

One scenario wherein the invention is to be used is 
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the following: Assume that a document exists in Italy, 
written in Italian, and generated by word processing pro- 
gram ABC. The invention creates the six blocks of infor- 
mation shown in Figure 3, and associates them with the 
file. 

If a user in the United States wishes to obtain the 
file, CLS will retrieve it. CLS maintains a profile for the 
user, which contains information as to the user's pre- 
ferred language, preferred word-processing program, 
and other information. 

The invention examines the six headers of the file, 
and learns that (a) the document is written in Italian, and 
(b) ABC word processor was used to generate it. (The 
other four headers will be ignored at present.) The in- 
vention examines the profile, and learns that the user 
prefers English, and prefers word processor XYZ. 

The invention then calls appropriate translation pro- 
grams, which translate the document from Italian into 
English, and from ABC format into XYZ format. The in- 
vention generates a new file, based on the selected file. 
The new file is ready for loading into the user's XYZ word 
processing program. 

The invention undertakes similar activities with re- 
spect to the other four blocks. 

Definitional Matters 

1 . When a file is created by an application program, 
such as a word-processing program, it contains infor- 
mation which is used by three, or more, agents: 

a) the system header, which is used by the operat- 
ing system (first agent); 

b) the application program's header, which is used 
by the application program (second agent); and 

c) the user data, which is used, or created, by the 
user (third agent). 

One or more of these agents are involved in normal ac- 
tivity involving the file. Normal activity includes 

a) creating the file (all agents are involved); 

b) normal file handling, such as copying, printing, 
displaying (first agent, or both first and second 
agents involved); and 

c) modifying the file (all agents involved). 

Since at least one agent is always involved in nor- 
mal file activity, these agents can be called "essential 
agents." 

In contrast, the agent which creates, or uses, the 
six blocks of data shown in Figure 3 is a non-essential 
agent. The file can be handled in its normal manner, (e. 
g., created or modified) without the involvement of this 
agent. 

Therefore, one definition of "essential agent" is an 
agent which is required in the normal file-handling ac- 
tivities of a file, or which is an application program which 



can load the file. 

2. The six blocks actually contain signals which are 
used by the systems of Figure 4 in calling other pro- 
grams which transform other data within the file. 

Numerous substitutions and modifications can be 
undertaken without departing from the true spirit and 
scope of the invention. What is desired to be secured 
by Letters Patent is the invention as defined in the fol- 
lowing claims. 



Claims 

1 . Means for generating a computer file, characterized 
15 in that said file comprises a data field and a header 

in which is stored identification information other 
than the information required in the presentation of 
any data stored in said data field on a computer dis- 
play screen. 

20 

2. Means according to claim 1 , characterized in that 
said identification information includes at least one 
of the following: 

25 i) natural language of the file; 

ii) type of encryption of the file; 

iii) type of compression of the file; 

iv) identity of the program which generated the 
file; 

30 v) identity of the operating system upon which 

the generating program ran; and 
vi) communication protocol information. 

3. A method of processing a computer file, comprising 
35 the following steps: 

a) maintaining, in association with the file, in- 
formation indicative of two or more of the fol- 
lowing: 

40 

i) natural language of the file; 

ii) type of encryption of the file; 

iii) type of compression of the file; 

iv) identity of the program which generated 
4$ the file; 

v) identity of the operating system upon 
which the generating program ran; and 

vi) communication protocol required by the 
file; 

50 

b) reading said information and calling respec- 
tive programs which process the file in accord- 
ance with the information. 
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